Research & Resources

Research Publications


Pearson's research publications for educators, parents, students, researchers, and policy makers.

Title | Abstract | Author(s) | Date
Performance of Ability Estimation Methods for Writing Assessments under Conditions of Multidimensionality | An increasing number of large-scale assessments contain constructed response items such as essays for the advantages they offer over traditional multiple-choice measures. Writing assessments in particular often contain a mixture of multiple-choice and essay items. These mixed-format assessments pose many technical challenges for psychometricians. This study builds directly upon the Meyers et al. (2009) study by investigating how ability estimation, essay scoring approach, measurement model, and proportion of points allocated to multiple-choice items and the essay item on mixed-format assessments interact to recover ability and item parameter estimates under different degrees of multidimensionality. | Meyers, Jason L.; Turhan, Ahmet; Fitzpatrick, Steven J. | 05-2010
What Item Writers Think When Writing Items: Towards a Theory of Item Writing Expertise | The study of expert item writers offers the possibility of “bottling” the knowledge and skills acquired by these experts over years of hard work. The descriptions of the identified conceptual knowledge and skills of expert item writers could be incorporated into item writing workshops in order to equip new item writers with the tools necessary to produce quality figural response items. | Fulkerson, Dennis; Nichols, Paul; Mittelholtz, David | 05-2010
A Multi-level Modeling Approach to Predicting Performance on a State ELA Assessment | The purpose of this study was to examine, on a state English Language Proficiency Examination for grades K-12, (a) the performance of students in low SES environments vs. high SES environments as measured by school Title I participation, (b) the performance of males vs. females, (c) the effect of ethnicity (Hispanic vs. non-Hispanic students), and (d) any interaction effects. | Brown, Raymond S.; Nguyen, T.; Stephenson, A. | 05-2010
Comparisons of Test Characteristic Curve Alignment Criteria of the Anchor Set and the Total Test: Maintaining Test Scale and Impacts on Student Performance | The current paper investigates a tenet of the traditional view on the psychometric characteristics of such anchor sets. Specifically, the traditional guideline, without any specificity, states that the test characteristic curves (TCCs) of the anchor set and the total test should closely overlap. | Karkee, Thakur B., PhD; Fatica, Kevin; Murphy, Stephen T., PhD | 05-2010
The Impact of Different Anchor Stability Methods on Equating Results and Student Performance | The key objective of this study is to demonstrate a methodological procedure or strategy for examining the different anchor stability procedures and the accompanying results and to evaluate the impact on the final RSSS tables and reported cut scores (i.e., performance levels). For our study we did not include the bivariate plots for the old and new parameter values. | Murphy, Stephen; Little, Ian; Fan, Meichu; Lin, Chow-Hong; Kirkpatrick, Rob | 05-2010
Improving the Post-Smoothing of Test Norms with Kernel Smoothing | The traditional post-smoothing methodology for developing norms used on educational and clinical products is to hand-smooth the scale scores or their distributions. This approach is very subjective, difficult to replicate, and extremely labor intensive. In hand-smoothing, the scores or distributions are adjusted based on personal judgment. Different people, or the same person at different times, will make significantly different judgments. By contrast, the kernel smoothing method is a nonparametric approach, which is more flexible, less subjective, and easier to replicate. | Lin, Anli; Yi, Qing; Young, Michael J. | 05-2010
The Modified Briefing Book Standard Setting Method: Using Validity Data as a Basis for Setting Cut Scores | This paper focuses on two aspects of the modified briefing book standard setting process developed to meet this need: 1) the validity research conducted to support the standard setting; and 2) the standard setting itself, through which the validity research and associated pertinent information were organized and presented to the panelists, and the resulting process through which these data were used to elicit cut score judgments. | Miles, Julie A.; Beimers, Jennifer N.; Way, Walter D. | 05-2010
Impact of Non-representative Anchor Items on Scale Stability | This study attempts to fill this gap by simulating item response data over multiple administrations under the common-item nonequivalent groups design and examining the effects of non-representative anchor items on scale stability. | Wei, Hua | 05-2010
Rater Effects as a Function of Rater Training Context | This study examined the influence of rater training and scoring context on the manifestation of rater effects in a group of trained raters. | Wolfe, Edward W.; McVay, Aaron | 05-2010
The Hazards of Newness: A Portrait of Challenges Faced by New High School English Teachers | This paper reports findings of a survey study designed to examine how high school English teachers are assigned to teach particular grades and track levels, whether these teachers have their own classrooms, and how they and their students perceive one another. | Bieler, Deborah; Holmes, Stephen; Wolfe, Edward W. | 05-2010
IRT Proficiency Estimators and Their Impact | In the current study, we further examined the statistical properties of the various IRT estimators, especially focusing on their practical impact on the reported scores. We also investigated a few practical scenarios, where the testing focus is on assessing college readiness, assessing students’ minimal competency, or providing estimates for students who have failed a previous exam (retesters). | Tong, Ye; Kolen, Michael J. | 05-2010
Correlates of Mathematics Achievement in Developed and Developing Countries: An HLM Analysis of TIMSS 2003 Eighth-grade Mathematics Scores | The purpose of this study was to investigate correlates of math achievement in both developed and developing countries. Specifically, two developed countries and two developing countries that participated in the TIMSS 2003 eighth-grade math assessment were selected for this study. For each country, contextual factors at both the student and the teacher/school levels were used to construct models that yield country-specific findings related to students’ math performance. | Phan, Ha; Sentovich, Christina; Kromrey, Jeffrey; Dedrick, Robert; Ferron, John | 05-2010
Autocorrelation in the COFM. The Effects of Autocorrelation on the Curve-of-factors Growth Model | This simulation study examined the performance of the curve-of-factors model (COFM) when autocorrelation and growth processes were present in the first-level factor structure. In addition to the standard curve-of-factors growth model, two new models were examined: one COFM that included a first-order autoregressive autocorrelation parameter, and a second model that included first-order autoregressive and moving average autocorrelation parameters. | Murphy, Daniel J.; Beretvas, S. Natasha; Pituch, Keenan A. | 05-2010
Distractor Rationale Taxonomy: Diagnostic Assessment of Reading with Ordered Multiple-Choice Items | The distractor rationale taxonomy (DRT) examined in this study is an understanding-level-driven distractor analysis system for multiple-choice items. The DRT purposely creates distractors at different comprehension levels to pinpoint sources of misunderstanding. | Lin, Jie; Lee Chu, Kwang; Meng, Ying | 05-2010
Investigating Approaches to Estimate an Individual's Strand/objective Score Profile Reliability: A Monte Carlo Study | The paper studies the performance of generalizability and classical test theory reliability approaches for estimating the reliability of an individual's strand/objective score profile. | Arce-Ferrer, Alvaro J. | 05-2010
Derivation of a Profile Reliability Index for an Individual: A Multi-Factor Congeneric Approach with Guttman Error Type Structures | The paper discusses results and proposes research to substantiate current supporting evidence for the operational use of the profile reliability approach. | Arce-Ferrer, Alvaro J. | 11-2009
Growth, Precision, and CAT: An Examination of Gain Score Conditional SEM | Measurement of student growth is an important topic for K-12 state testing programs, both for school accountability and for reporting the progress of individual students. | Thompson, Tony D. | 06-2008
Effects of Different Training and Scoring Approaches on Human Constructed Response Scoring | This paper summarizes and discusses research studies related to the human scoring of constructed response items that have been conducted recently at a large-scale testing company. | Nichols, Paul; Vickers, Daisy; Way, Walter D. | 04-2008
Person-fit of English Language Learners (ELL) in K-12 High-Stakes Assessments | The No Child Left Behind Act holds states using federal funds accountable for student academic achievement. | Wan, Lei; Wu, Brad | 04-2008
User-Centered Assessment Design | In this paper, we introduce user-centered assessment design (UCAD), an approach to test design intended to produce assessments that deliver to teachers the kind of complex information on student learning and knowledge that they can combine with sound pedagogical practice to improve student achievement. | Adams, Jeremy; Mittelholtz, David; Nichols, Paul; Van Duesen, Robert | 03-2008
A Tale of Two Modes: A Case Study in User-centered Design’s Role in Comparability and Construct Validity | One merit of user-centered assessment design (UCAD) as defined by Nichols et al. (2008) is its broadened view of test development. | Strain-Seymour, Ellen, PhD | 03-2008
Usability and Design Considerations for Computer-based Learning and Assessment | The overall success of computer-based products and systems depends to a significant extent on their usability and usefulness in the intended context. | Adams, Jeremy; Harms, Michael | 03-2008
Field Testing and Equating Designs for State Educational Assessments | The educational accountability movement has spawned unprecedented numbers of new assessments. For example, the No Child Left Behind Act of 2002 (NCLB) required states to test students in grades 3 through 8 and at one grade in high school each year. | Kirkpatrick, Rob; Way, Walter D. | 03-2008
An Investigation of the Changes in Item Parameter Estimates for Items Re-field Tested | Large-scale state testing programs typically rely upon a large bank of items to select from when building assessments. | Kong, Xiaojing Jadie; McClarty, Katie Larsen; Meyers, Jason L. | 03-2008
A Comparison of Pre-Equating and Post-Equating Using Large-Scale Assessment Data | Equating is a statistical process that is used to adjust scores on test forms so that scores on the forms can be used interchangeably (Kolen & Brennan, 2004), even though the test forms consist of different items. | Tong, Ye; Wu, Sz-Shyan; Xu, Ming | 03-2008
Maintenance of Vertical Scales | Vertical scaling refers to the process of placing scores of tests that measure similar domains but at different educational levels onto a common scale, a vertical scale. | Kolen, Michael J.; Ye, Tong | 03-2008
Evidence of Test Score Use in Validity: Roles and Responsibilities | This paper has three goals. | Nichols, Paul D.; Williams, Natasha | 03-2008
Score Reporting, Off-the-Shelf Assessments and NCLB: Truly an Unholy Trinity | One consequence resulting from NCLB, particularly as instructional time becomes more precious, is the desire to be more efficient in assessing learning. | Twing, Jon S., PhD | 03-2008
Applying a User-Centered Design Approach to Data Management: Paper and Computer Testing | This paper discusses the application of a user-centered design (UCD) approach to a web-based application system that supports data management components of the high-stakes assessment lifecycle. | Wilson, Jeffrey R., PhD | 03-2008
Exploring the Use of Item Bank Information to Improve IRT Item Parameter Estimation | On occasion, the sample of students available for calibrating a set of assessment items may not be optimal. | Ansley, Timothy; Hall, Erika |
A Comparison of Item and Testlet Selection Procedures in Computerized Adaptive Testing | Testlet response theory (TRT) is a measurement model that can capture local dependency in testlet-based tests. | Chen, Tzu-An Ann; Dodd, Barbara G.; Ho, Tsung-Han; Keng, Leslie |
Response Probability Criterion and Subgroup Performance | In the standard setting literature, there has been much debate about the most appropriate response probability (RP) to use in an item mapping procedure such as the Bookmark Standard Setting Procedure. | Egan, Karla; Mueller, Canda D.; Schneider, M. Christina |
A Generalization of Stratified α that Allows for Correlated Measurement Errors between Subtests | This paper presents a generalization of Stratified α that allows for correlated measurement errors between some subtest scores that make up a composite score. | Keng, Leslie; Miller, G. Edward; O'Malley, Kimberly; Turhan, Ahmet |